A morphological approach for segmentation and tracking of human faces
A new technique for segmenting and tracking human faces in video sequences is presented. The technique relies on morphological tools: connected operators to extract the connected component most likely to belong to a face, and partition projection to track this component through the sequence. A binary partition tree (BPT) is used to implement the connected operator. The BPT is constructed using chrominance criteria, and its nodes are analyzed so that the selected node maximizes an estimate of the likelihood of being part of a face. Tracking is performed using a partition projection approach: images are divided into face and non-face parts, which are tracked through the sequence. The technique has been successfully assessed on several test sequences from the MPEG-4 (raw format) and MPEG-7 (MPEG-1 format) databases.
Saliency maps on image hierarchies
© 2015 Elsevier B.V. All rights reserved.
In this paper we propose two saliency models for salient object segmentation based on a hierarchical image segmentation: a tree-like structure that represents regions at different scales, from fine details to the whole image (e.g. gPb-UCM, BPT). The first model is based on a hierarchy of image partitions. The saliency at each level is computed on a region basis, taking into account the contrast between regions. The maps obtained for the different partitions are then integrated into a final saliency map. The second model works directly on the structure created by the segmentation algorithm, computing saliency at each node and integrating these cues in a straightforward manner into a single saliency map. We show that the proposed models produce high-quality saliency maps. Objective evaluation demonstrates that the two methods achieve state-of-the-art performance on several benchmark datasets.
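The region-basis contrast computation used by the first model can be sketched generically: each region's saliency is its area-weighted feature contrast to all other regions in a partition. This is a simplified stand-in, assuming mean-feature regions; the paper's actual criteria and the cross-level fusion are not reproduced here.

```python
import numpy as np

def region_contrast_saliency(features, sizes):
    """Per-region saliency as area-weighted feature contrast to all other
    regions in one partition level (a generic sketch; the paper computes this
    over gPb-UCM/BPT hierarchies and fuses the per-level maps)."""
    n = len(features)
    sal = np.zeros(n)
    for i in range(n):
        d = np.linalg.norm(features - features[i], axis=1)  # contrast to every region
        w = sizes.astype(float).copy()
        w[i] = 0.0                                          # exclude the region itself
        sal[i] = (w * d).sum() / w.sum()
    # normalize to [0, 1] for map integration
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```

A region whose features stand out from large surrounding regions receives the highest score, which matches the contrast intuition in the abstract.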
3D Convolutional Neural Networks for Brain Tumor Segmentation: A Comparison of Multi-resolution Architectures
This paper analyzes the use of 3D convolutional neural networks for brain tumor segmentation in MR images. We address the problem using three different architectures that combine fine and coarse features to obtain the final segmentation. We compare three different networks that use multi-resolution features in terms of both design and performance, and we show that they improve on their single-resolution counterparts.
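The fine/coarse feature combination these architectures share can be illustrated with a minimal two-path sketch. This is an assumption-laden toy (2D instead of 3D, pooling instead of strided convolutions, and invented function names), meant only to show the structural idea of fusing resolutions before the final segmentation layer.

```python
import numpy as np

def downsample2(x):
    # 2x average pooling: a crude stand-in for a strided convolutional path.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean((1, 3))

def upsample2(x):
    # Nearest-neighbour upsampling back to the fine resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def multires_features(x):
    """Sketch of multi-resolution fusion: full-resolution (fine) features are
    stacked with upsampled features from a coarser path, so a final layer sees
    both local detail and wider context (2D here for brevity; the paper is 3D)."""
    fine = x                                # fine path: detail at input resolution
    coarse = upsample2(downsample2(x))      # coarse path: context, lower detail
    return np.stack([fine, coarse], axis=-1)
```

The coarse channel is constant over each 2x2 block, which is exactly the loss of detail the fine path compensates for.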
Action tube extraction based 3D-CNN for RGB-D action recognition
In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes a video as input and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD, and its role is to define the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and is designed to remove frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: (1) a higher ratio of region of interest (the action subjects) to background; and (2) most frames contain obvious motion change. We propose to use a two-stream (RGB and depth) I3D architecture as our 3D-CNN model. Our approach outperforms state-of-the-art methods on the OA and NTU RGB-D datasets. © 2018 IEEE.
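The SSIM-based temporal sampling step can be sketched as follows: a frame is kept only if it is sufficiently dissimilar (low SSIM) from the last kept frame. This is a minimal sketch under simplifying assumptions: a global single-window SSIM rather than the usual windowed version, and an illustrative threshold; the paper's exact procedure may differ.

```python
import numpy as np

def ssim(a, b, c1=0.01**2, c2=0.03**2):
    # Global (single-window) SSIM between two grayscale frames in [0, 1].
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def temporal_sample(frames, thresh=0.95):
    """Drop frames without obvious motion: keep a frame only if its SSIM to
    the last kept frame falls below a similarity threshold."""
    kept = [frames[0]]
    for f in frames[1:]:
        if ssim(kept[-1], f) < thresh:
            kept.append(f)
    return kept
```

A static clip collapses to a single frame, while a clip with motion in every frame is kept whole, which is the property the abstract claims for the extracted tube.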
Layer-wise training for self-supervised learning on graphs
End-to-end training of graph neural networks (GNNs) on large graphs presents several memory and computational challenges, and limits the application to shallow architectures, as depth exponentially increases the memory and space complexities. In this manuscript, we propose Layer-wise Regularized Graph Infomax, an algorithm to train GNNs layer by layer in a self-supervised manner. We decouple the feature propagation and feature transformation carried out by GNNs to learn node representations, in order to derive a loss function based on the prediction of future inputs. We evaluate the algorithm on large inductive graphs and show performance similar to other end-to-end methods with substantially increased efficiency, which enables the training of more sophisticated models on a single device. We also show that our algorithm avoids oversmoothing of the representations, another common challenge of deep GNNs.
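The layer-by-layer scheme described above can be sketched structurally: each layer is trained with a local self-supervised objective and then frozen, so only one layer's parameters are in memory at a time. The loss below (predicting the propagated output) is a crude stand-in for the paper's regularized infomax objective, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Symmetrically normalized adjacency with self-loops (GCN-style).
    A = A + np.eye(len(A))
    d = A.sum(1)
    return A / np.sqrt(np.outer(d, d))

def propagate(A_hat, H):
    # Feature propagation, decoupled from the learnable feature transformation.
    return A_hat @ H

def train_layerwise(X, A_hat, dims, steps=100, lr=0.05):
    """Greedy layer-wise training sketch: fit each layer's weights with a local
    self-supervised loss (here: predict the layer's own propagated output, a
    stand-in for the paper's objective), freeze them, and feed the propagated
    output to the next layer. No gradients ever flow across layers."""
    H = X
    weights = []
    for d in dims:
        W = rng.normal(scale=0.1, size=(H.shape[1], d))
        for _ in range(steps):
            Z = np.tanh(H @ W)
            target = propagate(A_hat, Z)                  # treated as constant (stop-gradient)
            G = H.T @ ((Z - target) * (1 - Z**2)) / len(H)
            W -= lr * G
        weights.append(W)
        H = propagate(A_hat, np.tanh(H @ W))              # frozen output feeds the next layer
    return H, weights
```

Because each inner loop touches only one weight matrix, peak memory is bounded by the widest single layer rather than by network depth, which is the efficiency argument in the abstract.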
Feature propagation as self-supervision signals on graphs
Self-supervised learning is gaining considerable attention as a way to avoid the need for extensive annotations in representation learning on graphs. Current algorithms are based on contrastive learning, which is computationally and memory expensive, and on the assumption of invariance under certain graph augmentations. However, graph transformations such as edge sampling may modify the semantics of the data, so the invariance assumption may not hold. We introduce Regularized Graph Infomax (RGI), a simple yet effective framework for node-level self-supervised learning that trains a graph neural network encoder by maximizing the mutual information between output node embeddings and their propagation through the graph, which encode the nodes' local and global context, respectively. RGI does not use graph data augmentations, instead generating self-supervision signals through feature propagation; it is non-contrastive and does not depend on a two-branch architecture. We run RGI in both transductive and inductive settings on popular graph benchmarks and show that it can achieve state-of-the-art performance despite its simplicity.
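The self-supervision signal described above, propagated embeddings as targets instead of augmented views, can be sketched very compactly. This is not RGI's exact objective (the mutual-information maximization and its regularizers are not reproduced); it only illustrates how feature propagation alone, with no augmentations and a single branch, yields a training target.

```python
import numpy as np

def row_normalize(A):
    # Random-walk normalized adjacency with self-loops.
    A = A + np.eye(len(A))
    return A / A.sum(1, keepdims=True)

def propagate(A_hat, Z, k=2):
    # k-step feature propagation: each node aggregates its k-hop context.
    for _ in range(k):
        Z = A_hat @ Z
    return Z

def rgi_style_loss(Z, A_hat):
    """Non-contrastive self-supervision in the spirit of RGI (a sketch, not
    the paper's objective): pull each node embedding toward its own propagation
    through the graph. No augmentations, no negatives, no second branch."""
    target = propagate(A_hat, Z)
    return ((Z - target) ** 2).mean()
```

Note that this bare prediction loss admits trivially smooth solutions; the "Regularized" part of RGI exists precisely to rule those out.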
Monte-Carlo sampling applied to multiple instance learning for whole slide image classification
In this paper we propose a patch sampling strategy based on sequential Monte-Carlo methods for whole slide image classification in the context of multiple instance learning, and show its capability to achieve high generalization performance in differentiating between sun-exposed and non-sun-exposed pieces of skin tissue.
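A sequential Monte-Carlo patch sampler can be sketched as a weight-resample-jitter loop over candidate patch locations. Everything here is an illustrative assumption, the informativeness score, the proposal noise, and the function names; the paper's actual procedure and scoring model are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_patch_sampling(score_map, n_patches=8, iters=3, sigma=1.0):
    """Sequential Monte-Carlo sketch: sample candidate patch centers, weight
    them by an informativeness score, resample in proportion to the weights,
    and jitter (the proposal step), so patches concentrate on high-score
    tissue regions over iterations."""
    h, w = score_map.shape
    centers = np.stack([rng.integers(0, h, n_patches),
                        rng.integers(0, w, n_patches)], axis=1).astype(float)
    for _ in range(iters):
        ij = np.clip(centers.round().astype(int), 0, [h - 1, w - 1])
        weights = score_map[ij[:, 0], ij[:, 1]] + 1e-8     # importance weights
        weights /= weights.sum()
        idx = rng.choice(n_patches, size=n_patches, p=weights)  # resampling step
        centers = centers[idx] + rng.normal(0, sigma, (n_patches, 2))  # proposal jitter
    return np.clip(centers.round().astype(int), 0, [h - 1, w - 1])
```

In a MIL pipeline, the patches cut at the returned centers would form the bag fed to the instance-level classifier.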
Brain MRI super-resolution using generative adversarial networks
In this work we propose an adversarial learning approach to generate high-resolution MRI scans from low-resolution images. The architecture, based on the SRGAN model, adopts 3D convolutions to exploit volumetric information. For the discriminator, the adversarial loss uses least squares in order to stabilize training. For the generator, the loss function is a combination of a least-squares adversarial loss and a content term based on mean square error and image gradients, in order to improve the quality of the generated images. We explore different solutions for the upsampling phase. We present promising results that improve on classical interpolation, showing the potential of the approach for 3D medical imaging super-resolution.
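The composite generator objective described above can be written down directly: pixel MSE, a gradient-difference content term, and a least-squares adversarial term. The sketch below is 2D for brevity (the paper uses 3D volumes) and the loss weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def image_gradients(x):
    # Finite-difference gradients along the two spatial axes.
    return np.diff(x, axis=0), np.diff(x, axis=1)

def generator_loss(fake, real, d_fake, w_adv=1e-3, w_grad=1.0):
    """Composite generator loss sketch: pixel MSE + gradient-difference
    content term + least-squares adversarial term pushing the discriminator's
    scores on generated images toward 1 (the 'real' label)."""
    mse = ((fake - real) ** 2).mean()
    gfx, gfy = image_gradients(fake)
    grx, gry = image_gradients(real)
    grad = ((gfx - grx) ** 2).mean() + ((gfy - gry) ** 2).mean()
    adv = ((d_fake - 1.0) ** 2).mean()    # least-squares GAN generator term
    return mse + w_grad * grad + w_adv * adv
```

The gradient term penalizes blurred edges that plain MSE tolerates, which is the stated motivation for adding it to the content loss.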
Picking groups instead of samples: a close look at Static Pool-based Meta-Active Learning
©2019 IEEE. Active learning techniques are used to tackle learning problems where obtaining training labels is costly. In this work we use meta-active learning to learn to select a subset of samples from a pool of unlabeled inputs for further annotation. This scenario is called static pool-based meta-active learning. We propose to extend existing approaches by performing the selection in a manner that, unlike previous works, conditions the selection of each sample on the whole selected subset.
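The distinction the abstract draws, scoring each candidate conditioned on the whole subset selected so far rather than independently, can be illustrated with a simple greedy criterion. The farthest-point rule below is only a hand-crafted stand-in for the learned meta-selection policy; the function name and the deterministic starting point are assumptions.

```python
import numpy as np

def select_subset(X, k):
    """Greedy subset-aware selection sketch: each new sample is the one
    farthest from the *entire* subset chosen so far (farthest-point sampling),
    so every pick depends on all previous picks, unlike per-sample scoring."""
    chosen = [0]  # deterministic start for illustration
    for _ in range(k - 1):
        # distance from every pool point to its nearest already-chosen point
        d = np.min(np.linalg.norm(X[:, None] - X[chosen][None], axis=2), axis=1)
        d[chosen] = -1.0              # never re-pick a chosen sample
        chosen.append(int(d.argmax()))
    return chosen
```

With two tight clusters and an outlier, the rule picks one representative per group instead of two near-duplicates, which independent per-sample scoring could not guarantee.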
BCN20000: dermoscopic lesions in the wild
This article summarizes the BCN20000 dataset, composed of 19,424 dermoscopic images of skin lesions captured from 2010 to 2016 in the facilities of the Hospital Clínic in Barcelona. With this dataset, we aim to study the problem of unconstrained classification of dermoscopic images of skin cancer, including lesions found in hard-to-diagnose locations (nails and mucosa), large lesions which do not fit in the aperture of the dermoscopy device, and hypo-pigmented lesions. BCN20000 will be provided to the participants of the ISIC Challenge 2019 [8], where they will be asked to train algorithms to classify dermoscopic images of skin cancer automatically.